Segmentation, Content Extraction and Visualization of Broadcast News Video Using Multistream Analysis
Authors
Abstract
This paper reports the development of a broadcast news video corpus and novel techniques to automatically segment stories, extract proper names, and visualize associated metadata. We report story segmentation and proper name extraction results using an information retrieval inspired evaluation methodology, measuring the precision and recall performance of our techniques. We briefly describe our implementation of a Broadcast News Analysis (BNA™) system and an associated viewer, Broadcast News Navigator (BNN™). We point to current efforts toward more robust processing using multistream analysis on imagery, audio, and closed-caption streams and future efforts in automatic video summarization and user-tailored presentation generation.
1. Problem and Related Research
Content based video access is a valuable capability for several important applications including video teleconference archiving, video mail access, and individualized video news program generation. The current state of the art for commercial video archives focuses on manual annotation (Davis 1991), which suffers from problems of accuracy, consistency (when performed by more than one operator), timeliness, scalability, and cost. Just as information retrieval techniques are required to manage large text collections, video sources require similar indexing and storage facilities to support real-time profiling as well as retrospective search. Within the imagery stream, techniques performing in the ninety-plus percent accuracy range have been developed to index video based on visual transitions (e.g., dissolve, fade, cut) and shot classification (e.g., anchor versus story shots) (Zhang et al. 1994). Others have investigated linguistic streams associated with video (e.g., closed captions, transcripts), indexing keywords and associated video keyframes to create static and hypertext depictions of television news (Bender & Chesnais 1988, Shahraray & Gibbon 1995).
Unfortunately, inverted indices of keywords (even when supplemented with linguistic processing to address complexities such as synonymy, polysemy, and coreference) will only support more traditional information retrieval tasks as opposed to segmentation (required for higher level browsing), information extraction, and summarization. More complex linguistic processing is reported by (Taniguchi et al. 1995), who use Japanese topic markers such as "ni tsuite" and "wa" ("with regard to", "as for"), subject/object markers, as well as frequency measures to extract discourse structures from transcripts, which are then used to provide topic-oriented video browsers. It has become increasingly evident that more sophisticated single and multistream analysis techniques will be required not only to improve accuracy but to support more fine grained access to content and to provide access to higher level structure (Aigraine 1995). (Brown et al. 1995), for example, provide content based access to video using a large scale, continuous speech recognition system to transcribe associated audio. In the Informedia™ project, (Hauptmann & Smith 1995) perform a series of multistream analyses including color histogram changes, optical flow analysis, and speech transcription (using CMU's Sphinx-II System). Similarly, we have found a need to correlate events such as subject changes and speaker changes to improve the accuracy of indexing and retrieval. In (Mani et al. 1996), we report our initial efforts to segment news video using both anchor/reporter and topic shifts identified in closed-caption text. In this paper, we report our more recent results, which include the correlation of multiple streams of analysis to improve story segmentation performance, the extraction of facts from the linguistic stream, and the visualization of extracted information to identify trends and patterns in the news.
Through the creation of a manually annotated video corpus representing ground truth and an associated set of evaluation metrics and methods, we are able to report statistical performance measures of our algorithms.
From: AAAI Technical Report SS-97-03. Compilation copyright © 1997, AAAI (www.aaai.org). All rights reserved.
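The precision and recall measures mentioned in the abstract can be illustrated for story segmentation with a minimal sketch. This excerpt does not give the paper's actual scoring rules, so everything here is an assumption made for illustration: the function name `precision_recall`, the idea of a tolerance window in seconds, the greedy one-to-one matching, and the boundary times themselves.

```python
def precision_recall(predicted, reference, tolerance=0.0):
    """Score predicted story boundaries (seconds) against reference boundaries.

    Assumption: a predicted boundary is correct if it lies within
    `tolerance` seconds of a reference boundary not already matched.
    Matching is greedy, which is simple but not guaranteed optimal.
    """
    unmatched = sorted(reference)
    correct = 0
    for p in sorted(predicted):
        # Closest reference boundary that is still unmatched, if any.
        best = min(unmatched, key=lambda r: abs(r - p), default=None)
        if best is not None and abs(best - p) <= tolerance:
            correct += 1
            unmatched.remove(best)
    # Precision: share of predictions that are correct.
    precision = correct / len(predicted) if predicted else 0.0
    # Recall: share of reference boundaries that were found.
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical boundary times in seconds, invented for illustration only.
reference = [0, 95, 210, 340]
predicted = [0, 100, 200, 280, 345]
p, r = precision_recall(predicted, reference, tolerance=10)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this invented example, four of the five predicted boundaries fall within the window of a reference boundary, and all four reference boundaries are found. A wider tolerance raises both scores, so any reported figure is only meaningful alongside the tolerance that was used.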
Similar resources
United States Patent Hull et al
and pp. 5/1-5/2. Langley "An Analysis of Bayesian Classifiers," Proceedings of the Tenth National Conference on Artificial Intelligence, pp. 223-228, 1992. Langley "Induction of Selective Bayesian Classifiers," Proceedings of the Tenth National Conference on Uncertainty in Artificial Intelligence, pp. 400-406, (1994). Li et al. "Automatic Text Detection and Tracking in Digital Video," IEEE Tran...
Visual Analysis of Multimedia Data
One of the most important applications in visual analytics has been exploratory visual analysis of large collections of unstructured text documents. However, digital media, especially those on the Internet, are multimedia in content with text, images, video, and even sound together. Furthermore, there is an explosion of broadcast and other media, especially in third world countries. (In the Mid...
Broadcast News Understanding and Navigation
The Broadcast News Editor (BNE) and Broadcast News Navigator (BNN) are fully implemented systems that exploit integrated image, speech, and language processing to support intelligent access to broadcast news video. This paper summarizes the integration of a range of AI techniques within these systems to provide intelligent segmentation, extraction, search, summarization, visualization, and pers...
Discourse Cues for Broadcast News Segmentation
This paper describes the design and application of time-enhanced, finite state models of discourse cues to the automated segmentation of broadcast news. We describe our analysis of a broadcast news corpus, the design of a discourse cue based story segmentor that builds upon information extraction techniques, and finally its computational implementation and evaluation in the Broadcast News Navig...
Integrating multi-modal content analysis and hyperbolic visualization for large-scale news video retrieval and exploration
In this paper, we have developed a novel scheme to achieve more effective analysis, retrieval and exploration of large-scale news video collections by performing multi-modal video content analysis and synchronization. First, automatic keyword extraction is performed on news closed captions and audio channels to detect the most interesting news topics (i.e., keywords for news topic interpretatio...
Publication date: 2002